58 research outputs found

    The proximal point method revisited

    In this short survey, I revisit the role of the proximal point method in large-scale optimization. I focus on three recent examples: a proximally guided subgradient method for weakly convex stochastic approximation, the prox-linear algorithm for minimizing compositions of convex functions and smooth maps, and Catalyst generic acceleration for regularized Empirical Risk Minimization.
    Comment: 11 pages, submitted to SIAG/OPT Views and News
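
    Purely as a reading aid (not part of the survey), here is a minimal sketch of the basic proximal point step, $x_{k+1} = \operatorname{argmin}_x f(x) + \tfrac{1}{2\lambda}\|x - x_k\|^2$, for a convex quadratic $f(x) = \tfrac12 x^\top A x - b^\top x$, where each subproblem reduces to a linear system. The quadratic, the parameter lam, and the stopping rule are illustrative choices of ours.

        import numpy as np

        # Proximal point iteration: x_{k+1} = argmin_x f(x) + ||x - x_k||^2 / (2*lam).
        # For f(x) = 0.5 x^T A x - b^T x the subproblem is the linear system
        # (A + I/lam) x = b + x_k / lam.

        def proximal_point_quadratic(A, b, x0, lam=1.0, tol=1e-8, max_iter=500):
            x = x0.copy()
            M = A + np.eye(A.shape[0]) / lam      # Hessian of the regularized subproblem
            for _ in range(max_iter):
                x_new = np.linalg.solve(M, b + x / lam)
                if np.linalg.norm(x_new - x) <= tol:
                    return x_new
                x = x_new
            return x

        # Toy usage: a random positive definite quadratic.
        rng = np.random.default_rng(0)
        G = rng.standard_normal((5, 5))
        A = G.T @ G
        b = rng.standard_normal(5)
        x_star = proximal_point_quadratic(A, b, np.zeros(5))
        print(np.linalg.norm(A @ x_star - b))     # near zero at the minimizer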

    The many faces of degeneracy in conic optimization

    Slater's condition -- existence of a "strictly feasible solution" -- is a common assumption in conic optimization. Without strict feasibility, first-order optimality conditions may be meaningless, the dual problem may yield little information about the primal, and small changes in the data may render the problem infeasible. Hence, failure of strict feasibility can negatively impact off-the-shelf numerical methods, primal-dual interior point methods in particular. New optimization modelling techniques and convex relaxations for hard nonconvex problems have shown that the loss of strict feasibility is a more pronounced phenomenon than has previously been realized. In this text, we describe various reasons for the loss of strict feasibility, whether due to poor modelling choices or (more interestingly) rich underlying structure, and discuss ways to cope with it and, in many pronounced cases, how to use it as an advantage. In large part, we emphasize the facial reduction preprocessing technique due to its mathematical elegance, geometric transparency, and computational potential.
    Comment: 99 pages, 5 figures, 2 tables
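
    As a minimal illustration of how strict feasibility can fail (an example of ours, not taken from the text), consider the spectrahedron

        $\{ X \in \mathcal{S}^2_+ : X_{11} = 0 \}$.

    A positive semidefinite matrix with a zero diagonal entry has a zero corresponding row and column, so every feasible $X$ also satisfies $X_{12} = X_{21} = 0$ and no feasible point is positive definite; Slater's condition fails. Facial reduction makes the hidden constraint explicit, replacing $\mathcal{S}^2_+$ by its face $\{\operatorname{diag}(0, t) : t \ge 0\}$, over which the reduced problem is strictly feasible.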

    Complexity of a Single Face in an Arrangement of s-Intersecting Curves

    Consider a face F in an arrangement of n Jordan curves in the plane, no two of which intersect more than s times. We prove that the combinatorial complexity of F is $O(\lambda_s(n))$, $O(\lambda_{s+1}(n))$, and $O(\lambda_{s+2}(n))$ when the curves are bi-infinite, semi-infinite, or bounded, respectively; $\lambda_k(n)$ is the maximum length of a Davenport-Schinzel sequence of order k on an alphabet of n symbols. Our bounds asymptotically match the known worst-case lower bounds. Our proof settles the still apparently open case of semi-infinite curves. Moreover, it treats the three cases in a fairly uniform fashion.
    Comment: 9 pages, 5 figures
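
    As a side illustration of the quantity $\lambda_s(n)$ (not part of the paper), here is a brute-force check of the Davenport-Schinzel condition of order s: no two adjacent symbols are equal and no pair of symbols alternates $s+2$ times. The function names and the example sequence are ours.

        from itertools import combinations

        def longest_alternation(seq, a, b):
            """Length of the longest subsequence of seq alternating between a and b."""
            length, last = 0, None
            for x in seq:
                if x in (a, b) and x != last:
                    length += 1
                    last = x
            return length

        def is_davenport_schinzel(seq, s):
            # Adjacent equal symbols are forbidden.
            if any(x == y for x, y in zip(seq, seq[1:])):
                return False
            # Alternations a..b..a..b of length s+2 are forbidden.
            return all(longest_alternation(seq, a, b) <= s + 1
                       for a, b in combinations(set(seq), 2))

        # 1,2,1,3,1,2 contains the alternation 1,2,1,2 of length 4, so it is a
        # DS sequence of order 3 but not of order 2.
        print(is_davenport_schinzel([1, 2, 1, 3, 1, 2], 2))  # False
        print(is_davenport_schinzel([1, 2, 1, 3, 1, 2], 3))  # True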

    Stochastic model-based minimization of weakly convex functions

    We consider a family of algorithms that successively sample and minimize simple stochastic models of the objective function. We show that under reasonable conditions on approximation quality and regularity of the models, any such algorithm drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$. As a consequence, we obtain the first complexity guarantees for the stochastic proximal point, proximal subgradient, and regularized Gauss-Newton methods for minimizing compositions of convex functions with smooth maps. The guiding principle underlying the complexity guarantees is that all algorithms under consideration can be interpreted as approximate descent methods on an implicit smoothing of the problem, given by the Moreau envelope. Specializing to classical circumstances, we obtain the long-sought convergence rate of the stochastic projected gradient method, without batching, for minimizing a smooth function on a closed convex set.
    Comment: 33 pages, 4 figures
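
    A minimal sketch of one member of this family, the stochastic proximal point method, on the absolute-loss linear model $f(x; a, b) = |\langle a, x\rangle - b|$, for which the proximal subproblem has a closed form. The loss, the data, and the step size are illustrative choices of ours, not the paper's setting.

        import numpy as np

        # At each step, sample (a, b), take the *full* model f(x; a, b) = |<a, x> - b|,
        # and minimize it plus a proximal term:
        #     x_{k+1} = argmin_x |<a, x> - b| + ||x - x_k||^2 / (2 * step).

        def stochastic_proximal_point(A, b, x0, step=0.1, passes=20, seed=0):
            rng = np.random.default_rng(seed)
            x = x0.copy()
            for _ in range(passes):
                for i in rng.permutation(A.shape[0]):
                    a, bi = A[i], b[i]
                    r = a @ x - bi
                    sq = a @ a
                    if abs(r) <= step * sq:          # prox sets the residual to zero
                        x = x - (r / sq) * a
                    else:                            # otherwise a subgradient-like step
                        x = x - step * np.sign(r) * a
            return x

        # Toy usage on consistent data: the iterates approach a solution of A x = b.
        rng = np.random.default_rng(1)
        A = rng.standard_normal((50, 5))
        x_true = rng.standard_normal(5)
        b = A @ x_true
        x = stochastic_proximal_point(A, b, np.zeros(5))
        print(np.linalg.norm(x - x_true))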

    Efficiency of minimizing compositions of convex functions and smooth maps

    We consider global efficiency of algorithms for minimizing a sum of a convex function and a composition of a Lipschitz convex function with a smooth map. The basic algorithm we rely on is the prox-linear method, which in each iteration solves a regularized subproblem formed by linearizing the smooth map. When the subproblems are solved exactly, the method has efficiency $\mathcal{O}(\varepsilon^{-2})$, akin to gradient descent for smooth minimization. We show that when the subproblems can only be solved by first-order methods, a simple combination of smoothing, the prox-linear method, and a fast-gradient scheme yields an algorithm with complexity $\widetilde{\mathcal{O}}(\varepsilon^{-3})$. The technique readily extends to minimizing an average of $m$ composite functions, with complexity $\widetilde{\mathcal{O}}(m/\varepsilon^{2}+\sqrt{m}/\varepsilon^{3})$ in expectation. We round off the paper with an inertial prox-linear method that automatically accelerates in the presence of convexity.
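
    A minimal scalar sketch of the prox-linear step, for minimizing $|c(x)|$ with a smooth map $c$: each subproblem linearizes $c$ and, in this one-dimensional case, is solved by soft-thresholding. The map $c(x) = x^2 - 1$ and the parameter t are illustrative; the method in the paper handles vector-valued compositions plus an additive convex term.

        import numpy as np

        # Prox-linear step for F(x) = |c(x)| with smooth scalar c:
        #   x_{k+1} = argmin_y |c(x_k) + c'(x_k) (y - x_k)| + (y - x_k)^2 / (2 t).
        # Writing d = y - x_k, the minimizer is obtained by soft-thresholding c(x_k).

        def prox_linear_abs(c, dc, x0, t=0.5, max_iter=100, tol=1e-10):
            x = x0
            for _ in range(max_iter):
                b, a = c(x), dc(x)                       # linearization c(x) + a * d
                if a == 0.0:
                    d = 0.0
                else:
                    u = np.sign(b) * max(abs(b) - t * a * a, 0.0)   # soft-threshold
                    d = (u - b) / a
                x += d
                if abs(d) <= tol:
                    break
            return x

        # Toy usage: minimize |x^2 - 1|; from x0 = 3 the iterates approach x = 1.
        x_hat = prox_linear_abs(lambda x: x**2 - 1, lambda x: 2*x, x0=3.0, t=0.5)
        print(x_hat, abs(x_hat**2 - 1))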

    Graphical Convergence of Subgradients in Nonconvex Optimization and Learning

    We investigate the stochastic optimization problem of minimizing population risk, where the loss defining the risk is assumed to be weakly convex. Compositions of Lipschitz convex functions with smooth maps are the primary examples of such losses. We analyze the estimation quality of such nonsmooth and nonconvex problems by their sample average approximations. Our main results establish dimension-dependent rates on subgradient estimation in full generality and dimension-independent rates when the loss is a generalized linear model. As an application of the developed techniques, we analyze the nonsmooth landscape of a robust nonlinear regression problem.
    Comment: 36 pages
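
    A small Monte Carlo sketch of the object being studied, subgradient estimation by sample average approximation, for the absolute-loss generalized linear model $f(x) = \mathbb{E}|\langle a, x\rangle - b|$ with Gaussian data. The model, dimensions, and sample sizes are our illustrative choices, and this checks only pointwise subgradient error, not the paper's graphical-convergence results.

        import numpy as np

        # At a point with no zero residual, the SAA subgradient of
        # f_n(x) = (1/n) sum_i |<a_i, x> - b_i| is (1/n) sum_i sign(<a_i, x> - b_i) a_i.
        def saa_subgradient(A, b, x):
            return (np.sign(A @ x - b)[:, None] * A).mean(axis=0)

        rng = np.random.default_rng(0)
        d = 5
        x_bar = rng.standard_normal(d)            # noise-free model: b = <a, x_bar>
        x = rng.standard_normal(d)                # evaluation point

        # Proxy for the population subgradient from a very large sample.
        A_big = rng.standard_normal((200_000, d))
        g_pop = saa_subgradient(A_big, A_big @ x_bar, x)

        # The estimation error shrinks as the sample size grows.
        for n in (100, 1_000, 10_000, 100_000):
            A_n = rng.standard_normal((n, d))
            g_n = saa_subgradient(A_n, A_n @ x_bar, x)
            print(n, np.linalg.norm(g_n - g_pop))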

    Complexity of finding near-stationary points of convex functions stochastically

    In a recent paper, we showed that the stochastic subgradient method applied to a weakly convex problem drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$. In this supplementary note, we present a stochastic subgradient method for minimizing a convex function, with the improved rate $\widetilde{O}(k^{-1/2})$.
    Comment: 9 pages
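
    A small numerical sketch of the stationarity measure used here, the gradient of the Moreau envelope, computed as $\nabla f_\lambda(x) = (x - \operatorname{prox}_{\lambda f}(x))/\lambda$ with the proximal point found by a generic solver. The test function and $\lambda$ are illustrative, and this is not the stochastic method of the note.

        import numpy as np
        from scipy.optimize import minimize

        # Moreau envelope gradient: grad f_lam(x) = (x - prox_{lam f}(x)) / lam, where
        # prox_{lam f}(x) = argmin_y f(y) + ||y - x||^2 / (2 lam). Its norm is a
        # meaningful stationarity measure even when f is nonsmooth.
        def moreau_gradient(f, x, lam=0.5):
            obj = lambda y: f(y) + np.sum((y - x) ** 2) / (2 * lam)
            prox = minimize(obj, x, method="Nelder-Mead",
                            options={"xatol": 1e-10, "fatol": 1e-10,
                                     "maxiter": 20_000}).x
            return (x - prox) / lam

        # Toy usage: f(x) = ||x||_1 is nonsmooth, yet the envelope gradient is small
        # near the minimizer x = 0 and large away from it.
        f = lambda x: np.abs(x).sum()
        print(np.linalg.norm(moreau_gradient(f, np.array([2.0, -3.0]))))
        print(np.linalg.norm(moreau_gradient(f, np.array([1e-3, -1e-3]))))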

    Error bounds, quadratic growth, and linear convergence of proximal methods

    The proximal gradient algorithm for minimizing the sum of a smooth and a nonsmooth convex function often converges linearly even without strong convexity. One common reason is that a multiple of the step length at each iteration may linearly bound the "error" -- the distance to the solution set. We explain the observed linear convergence intuitively by proving the equivalence of such an error bound to a natural quadratic growth condition. Our approach generalizes to linear convergence analysis for proximal methods (of Gauss-Newton type) for minimizing compositions of nonsmooth functions with smooth mappings. We observe incidentally that short step-lengths in the algorithm indicate near-stationarity, suggesting a reliable termination criterion.
    Comment: 35 pages
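
    A minimal sketch of the setting: the proximal gradient method for l1-regularized least squares, terminated when the step length $\|x_{k+1} - x_k\|$ is small, in the spirit of the abstract's observation that short steps indicate near-stationarity. The problem data, regularization weight, and tolerance are illustrative choices of ours.

        import numpy as np

        # Proximal gradient for F(x) = 0.5 ||A x - b||^2 + reg * ||x||_1.
        # The smooth part has Lipschitz gradient with constant ||A||_2^2, so the
        # fixed step 1 / L is valid; the step length serves as the termination test.

        def soft_threshold(v, tau):
            return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

        def proximal_gradient(A, b, reg=0.1, tol=1e-8, max_iter=10_000):
            step = 1.0 / np.linalg.norm(A, 2) ** 2
            x = np.zeros(A.shape[1])
            for k in range(max_iter):
                grad = A.T @ (A @ x - b)
                x_new = soft_threshold(x - step * grad, step * reg)
                if np.linalg.norm(x_new - x) <= tol:   # short step => near-stationary
                    return x_new, k
                x = x_new
            return x, max_iter

        # Toy usage.
        rng = np.random.default_rng(0)
        A = rng.standard_normal((40, 10))
        b = rng.standard_normal(40)
        x_hat, iters = proximal_gradient(A, b)
        print(iters, np.count_nonzero(np.abs(x_hat) > 1e-6))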

    Semi-algebraic functions have small subdifferentials

    We prove that the subdifferential of any semi-algebraic extended-real-valued function on $\mathbb{R}^n$ has $n$-dimensional graph. We discuss consequences for generic semi-algebraic optimization problems.
    Comment: 21 pages, 1 figure, Accepted for publication in Mathematical Programming, Ser.
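
    A one-dimensional illustration of the statement (our example, not from the paper): for the semi-algebraic function $f(x) = |x|$ on $\mathbb{R}$, the graph of the subdifferential is

        $\operatorname{gph} \partial f = \{(x, \operatorname{sign}(x)) : x \neq 0\} \cup (\{0\} \times [-1, 1])$,

    a one-dimensional subset of $\mathbb{R}^2$, matching the $n$-dimensional bound with $n = 1$.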

    The nonsmooth landscape of phase retrieval

    We consider a popular nonsmooth formulation of the real phase retrieval problem. We show that under standard statistical assumptions, a simple subgradient method converges linearly when initialized within a constant relative distance of an optimal solution. Seeking to understand the distribution of the stationary points of the problem, we complete the paper by proving that as the number of Gaussian measurements increases, the stationary points converge to a codimension-two set at a controlled rate. Experiments on image recovery problems illustrate the developed algorithm and theory.
    Comment: 42 pages, 15 figures
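
    A minimal sketch of the nonsmooth formulation and a subgradient method for it, with Gaussian measurements and a Polyak-type step size (exploiting that the noiseless minimal value is zero) as a stand-in for the step rule analyzed in the paper; the good initialization is simulated directly, and all constants and dimensions are illustrative choices of ours.

        import numpy as np

        # Nonsmooth real phase retrieval: minimize
        #     f(x) = (1/m) * sum_i | <a_i, x>^2 - b_i |,   with b_i = <a_i, x_star>^2.
        # A subgradient is g(x) = (2/m) * A^T ( sign((A x)^2 - b) * (A x) ).

        def polyak_subgradient(A, b, x0, iters=200):
            x = x0.copy()
            m = A.shape[0]
            for _ in range(iters):
                Ax = A @ x
                r = Ax ** 2 - b
                fval = np.abs(r).mean()
                g = 2.0 * A.T @ (np.sign(r) * Ax) / m
                gnorm2 = g @ g
                if gnorm2 == 0.0:
                    break
                x = x - (fval / gnorm2) * g          # Polyak step: uses min f = 0
            return x

        rng = np.random.default_rng(0)
        d, m = 20, 200
        x_star = rng.standard_normal(d)
        A = rng.standard_normal((m, d))
        b = (A @ x_star) ** 2

        # Start within a constant relative distance of the signal, as in the abstract.
        x0 = x_star + 0.2 * np.linalg.norm(x_star) * rng.standard_normal(d) / np.sqrt(d)
        x_hat = polyak_subgradient(A, b, x0)
        # The global sign of x_star is unrecoverable, so measure distance up to sign.
        print(min(np.linalg.norm(x_hat - x_star), np.linalg.norm(x_hat + x_star)))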